BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies

Authors

  • Shihao Ji
  • S. V. N. Vishwanathan
  • Nadathur Satish
  • Michael J. Anderson
  • Pradeep Dubey

Abstract

We propose BlackOut, an approximation algorithm to efficiently train massive recurrent neural network language models (RNNLMs) with million-word vocabularies. BlackOut is motivated by a discriminative loss, and we describe a weighted sampling strategy that significantly reduces computation while improving stability, sample efficiency, and rate of convergence. One way to understand BlackOut is to view it as an extension of the DropOut strategy to the output layer, wherein we use a discriminative training loss and a weighted sampling scheme. We also establish close connections between BlackOut, importance sampling, and noise contrastive estimation (NCE). Our experiments on the recently released one billion word language modeling benchmark demonstrate the scalability and accuracy of BlackOut: we outperform the state of the art and achieve the lowest perplexity scores on this dataset. Moreover, unlike other established methods, which typically require GPUs or CPU clusters, we show that a carefully implemented version of BlackOut requires only 1 to 10 days on a single machine to train an RNNLM with a million-word vocabulary and billions of parameters on one billion words.
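The abstract describes BlackOut only at a high level. The sketch below is a minimal illustration of how a weighted sampling scheme and a discriminative loss can fit together at the output layer. It assumes (none of this is spelled out in the abstract) that K negative words are drawn without replacement from a proposal Q(w) proportional to count(w)^alpha, that each candidate's score is reweighted by 1/Q(w) inside a softmax restricted to the target plus the K samples, and that the loss rewards the target's probability while pushing down the sampled words' probabilities. The function names, the `alpha` parameter, and the toy data are hypothetical.

```python
import math
import random


def unigram_proposal(counts, alpha=0.5):
    """Hypothetical proposal Q(w) proportional to counts[w] ** alpha.

    alpha is an assumed knob: 0 gives a uniform Q, 1 gives the raw unigram.
    """
    powered = [c ** alpha for c in counts]
    total = sum(powered)
    return [p / total for p in powered]


def sample_negatives(q, target, k, rng):
    """Draw k distinct word ids from Q without replacement, never the target."""
    candidates = [w for w in range(len(q)) if w != target]
    negatives = []
    for _ in range(k):
        w = rng.choices(candidates, weights=[q[c] for c in candidates], k=1)[0]
        negatives.append(w)
        candidates.remove(w)
    return negatives


def blackout_style_loss(scores, q, target, negatives):
    """Loss for one position; scores[w] stands in for the output score of word w.

    Only the target and the K sampled words are touched, so the cost is
    O(K) rather than O(|V|).
    """
    subset = [target] + negatives
    # Weight each candidate by 1 / Q(w): exp(score) / Q(w) = exp(score - log Q(w)).
    logits = [scores[w] - math.log(q[w]) for w in subset]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Discriminative objective: reward the target, push sampled words down.
    objective = math.log(probs[0]) + sum(math.log1p(-p) for p in probs[1:])
    return -objective


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [50, 20, 10, 5, 1]          # toy unigram counts (made up)
    scores = [1.0, 0.5, -0.2, 0.3, 0.0]  # toy model scores (made up)
    q = unigram_proposal(counts)
    negs = sample_negatives(q, target=2, k=3, rng=rng)
    print(negs, blackout_style_loss(scores, q, 2, negs))
```

The 1/Q(w) weights are what link a scheme like this to importance sampling, which the abstract mentions; the exact proposal, sampling procedure, and objective used in the paper may differ from this sketch.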

Similar resources

Prefix tree based n-best list re-scoring for recurrent neural network language model used in speech recognition system

Recurrent Neural Network Language Model (RNNLM) has recently been shown to outperform N-gram Language Models (LM) as well as many other competing advanced LM techniques. However, the training and testing of RNNLM are very time-consuming, so in real-time recognition systems, RNNLM is usually used for re-scoring a limited size of n-best list. In this paper, issues of speeding up RNNLM are explore...

Improving Language Modelling with Noise-contrastive estimation

Neural language models do not scale well when the vocabulary is large. Noise contrastive estimation (NCE) is a sampling-based method that allows for fast learning with large vocabularies. Although NCE has shown promising performance in neural machine translation, its full potential has not been demonstrated in the language modelling literature. A sufficient investigation of the hyperparameters ...

Multi-Language Neural Network Language Models

In recent years there has been considerable interest in neural network based language models. These models typically consist of vocabulary dependent input and output layers and one, or more, hidden layers. A standard problem with these networks is that large quantities of training data are needed to robustly estimate the model parameters. This poses a challenge when only limited data is availab...

Character-based Neural Machine Translation

Neural Machine Translation (MT) has reached state-of-the-art results. However, one of the main challenges that neural MT still faces is dealing with very large vocabularies and morphologically rich languages. In this paper, we propose a neural MT system using character-based embeddings in combination with convolutional and highway layers to replace the standard lookup-based word representations...

Neural Network Language Modeling with Letter-based Features and Importance Sampling

In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks. We combine the use of subword features (letter ngrams) and one-hot encoding of frequent words so that the models can handle large vocabularies containing infrequent words. We propose a new objective function tha...

Journal title:
  • CoRR

Volume: abs/1511.06909

Publication date: 2015